Term Importance Degree Impact on Search Result Clustering
Abstract
Current clustering algorithms must cope with explosive growth in documents of various sizes and terms of various frequencies, so an appropriate term-weighting scheme has a crucial impact on the overall performance of such systems. Term-weighting is one of the critical processes for document retrieval and ranking in most search result clustering systems. In this paper we introduce a new technique for clustering algorithms. It addresses a problem found in most current clustering approaches: indexing the terms of big datasets together with their characteristics. The paper focuses on the term frequency normalization step of clustering algorithms. A new factor has been applied to basic term-weighting schemes for use in the clustering process. The evaluation results confirm that this factor increases the performance of clustering techniques. The experiments were carried out with standard algorithms on the ODP-239 dataset, and the results were validated by statistical tests.
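The abstract does not define the new factor, so the sketch below shows only the kind of baseline it would plug into: a common TF-IDF variant with length-normalized term frequency, where tf = count / document length and idf = log(N / df). The `importance` argument is a hypothetical per-term multiplier standing in for the paper's factor. It is an assumption for illustration, not the authors' formula.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def tfidf_weights(
    docs: List[List[str]],
    importance: Optional[Dict[str, float]] = None,
) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    tf  = raw count / document length   (term frequency normalization)
    idf = log(N / df)                    (a common textbook variant)
    importance: HYPOTHETICAL per-term factor (default 1.0), a placeholder
    for an extra weighting factor; not the paper's definition.
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)  # empty docs yield an empty dict, no division occurs
        doc_weights = {}
        for term, count in counts.items():
            tf = count / length
            idf = math.log(n_docs / df[term])
            factor = importance.get(term, 1.0) if importance else 1.0
            doc_weights[term] = tf * idf * factor
        weights.append(doc_weights)
    return weights


docs = [["search", "cluster", "cluster"], ["search", "rank"], ["web", "cluster"]]
baseline = tfidf_weights(docs)
boosted = tfidf_weights(docs, importance={"rank": 2.0})
```

A term that occurs in every document gets idf = log(1) = 0 and therefore contributes nothing to the document vectors. The length normalization keeps long documents from dominating simply because they repeat terms more often.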
Similar sources
Experiments in Document Clustering using Cluster Specific Term Weights
We study methods to initialize or bias different clustering methods using prior information about the “importance” of a keyword w.r.t. the specific clusters. These studies give us hints on how to initialize clustering methods in order to improve the clustering performance if prior knowledge is available. This can be especially useful if a user-specific clustering of a document collection or ...
Efficient Clustering Multiple Web Search Engine Results and Ranking
The World Wide Web is considered the most valuable resource for information retrieval and knowledge discovery. Effective and efficient techniques for Web service retrieval and selection in Web search engines have become an important issue. Existing web search results are based on keyword matching in a single search engine only. This paper details a modular, self-contained web search results clustering system t...
Experiments in Term Weighting and Keyword Extraction in Document Clustering
We study methods to initialize or bias different clustering methods using prior information about the “importance” of a keyword w.r.t. the whole document collection or a specific cluster. These studies give us hints on how to initialize clustering methods in order to improve performance if prior knowledge is available. This can be especially useful if a user-specific clustering of a document co...
Improving Retrieval Performance with Positive and Negative Equivalence Classes of Terms
One of the most pressing problems facing application developers in the area of information retrieval (IR) is the lack of sound mathematical, theoretical frameworks for understanding IR [SIGIR2000]. Although many such frameworks have been proposed, in the final analysis none has been sufficiently well-grounded to attain widespread acceptance in the field. In addition, there is all too often a la...
The impact of network characteristics on the diffusion of innovations
This paper studies the influence of network topology on the speed and reach of new product diffusion. While previous research has focused on comparing network types, this paper explores explicitly the relationship between topology and measurements of diffusion effectiveness. We study simultaneously the effect of three network metrics: the average degree, the relative degree of social hubs (i.e....
Publication date: 2014